Research Statement — Dani Yogatama

Author

Dani Yogatama

Abstract

I design algorithms for intelligent processing of natural language text, for example to extract factual information into a structured database (e.g., the headquarters locations, CEOs, and phone numbers of companies) or to predict real-world events from text (e.g., scientific trends, disease outbreaks). These applications require models of text that scale to large datasets. I advance machine learning (ML) methods for natural language processing (NLP), focusing on large-scale sparse models that leverage expert-informed domain knowledge. In my research, I seek to answer the following questions:

Related Resources

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

A Sparse and Adaptive Prior for Time-Dependent Model Parameters

We consider the scenario where the parameters of a probabilistic model are expected to vary over time. We construct a novel prior distribution that promotes sparsity and adapts the strength of correlation between parameters at successive timesteps, based on the data. We derive approximate variational inference procedures for learning and prediction with this prior. We test the approach on two t...

Embedding Methods for Fine Grained Entity Type Classification

We propose a new approach to the task of fine grained entity type classifications based on label embeddings that allows for information sharing among related labels. Specifically, we learn an embedding for each label and each feature such that labels which frequently co-occur are close in the embedded space. We show that it outperforms state-of-the-art methods on two fine grained entity-classif...

Linguistic Structured Sparsity in Text Categorization

We introduce three linguistically motivated structured regularizers based on parse trees, topics, and hierarchical word clusters for text categorization. These regularizers impose linguistic bias in feature weights, enabling us to incorporate prior knowledge into conventional bag-of-words models. We show that our structured regularizers consistently improve classification accuracies compared to ...

Predicting a Scientific Community's Response to an Article

We consider the problem of predicting measurable responses to scientific articles based primarily on their text content. Specifically, we consider papers in two fields (economics and computational linguistics) and make predictions about downloads and within-community citations. Our approach is based on generalized linear models, allowing interpretability; a novel extension that captures first-o...

Publication date: 2015
